ABSTRACT



The scatterplot is one of the most powerful and most used statistical tools. With only a small
amount of additional effort, the visual information can be greatly increased by plotting another set of 
points whose purpose is to summarize some aspect of the scatterplot. 



1. INTRODUCTION 

Figure 1 shows a scatterplot of points $(x_i, y_i)$, for $i = 1, \ldots, n$, where $n = 50$.
In Figure 2 the same scatterplot is summarized by another set of points $(x_i, \hat{y}_i)$, for
$i = 1, \ldots, n$, which are plotted by joining successive values by straight lines. The point
$(x_i, \hat{y}_i)$ portrays the middle of the distribution of the variable on the vertical axis, Y,
given the value of the variable on the horizontal axis, $X = x_i$. The formation of the new points
will be referred to as "smoothing" the scatterplot. The point $(x_i, \hat{y}_i)$ is called the
smooth at $x_i$ and $\hat{y}_i$ is called the fitted value at $x_i$.

The example in Figure 1 was generated by taking $x_i = i$ for $i = 1, \ldots, 50$ and

$$y_i = .02\, x_i + \epsilon_i ,$$

where the $\epsilon_i$ are a random sample from a normal distribution with mean 0 and variance 1.
The linear effect is not easily perceived from the scatterplot alone, but is revealed when the
smooth is superimposed.

In this paper we shall discuss a method for smoothing 
scatterplots called robust locally weighted regression. The 
details of the method are given in Section 2. Various visual 
considerations and alternative plotting procedures are dis- 
cussed in Section 3 and computational matters are discussed 
in Section 4. References [1], [2], [3, p. 225], [4], [5,
Chapters 8 and 9], and [6] describe other methods for
smoothing scatterplots.

2. ROBUST LOCALLY WEIGHTED REGRESSION 

The method of smoothing used in Figure 2, which is 
called robust locally weighted regression, is defined by the 
following sequence: 

(1) Let

$$W(x) = (1 - |x|^3)^3 \, I(x)$$

where $I(x) = 1$ if $|x| < 1$ and $I(x) = 0$ if $|x| \ge 1$. Let

$$B(x) = (1 - x^2)^2 \, I(x) .$$

(2) For each i let $h_i$ be the distance from $x_i$ to the r-th nearest neighbor of $x_i$.
That is, $h_i$ is the r-th smallest number among $|x_i - x_j|$, for $j = 1, \ldots, n$.
For $k = 1, \ldots, n$ let

$$w_k(x_i) = W\left(\frac{x_k - x_i}{h_i}\right) .$$

(3) For each i compute $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$, the intercept and slope
respectively, of a linear regression of $y_k$ on $x_k$ using weighted least squares with weight
$w_k(x_i)$ at $(x_k, y_k)$. That is, $\hat\beta_0(x_i)$ and $\hat\beta_1(x_i)$ are the values of
$\beta_0$ and $\beta_1$ which minimize

$$\sum_{k=1}^{n} w_k(x_i)\,(y_k - \beta_0 - \beta_1 x_k)^2 .$$

Let

$$\hat{y}_i = \hat\beta_0(x_i) + \hat\beta_1(x_i)\, x_i$$

be the fitted value of the line at $x_i$.

(4) Let

$$e_i = y_i - \hat{y}_i$$

be the residuals from the current fitted values. Let s be the median of the $|e_i|$. Define
robustness weights by

$$\delta_k = B\left(\frac{e_k}{6s}\right) .$$

(5) Recompute $\hat{y}_i$ for each i by fitting a line using weighted least squares with
weight $\delta_k w_k(x_i)$ at $(x_k, y_k)$.

(6) Repeatedly carry out steps (4) and (5) a total of one, two, or three times or until
convergence occurs. The final $\hat{y}_i$ are robust locally weighted regression fitted values.
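
The six steps above can be sketched in a few lines of NumPy. This is an illustrative sketch and not the author's FORTRAN routine: the function names (`tricube`, `bisquare`, `weighted_line_value`, `robust_lowess`) and the arguments `f` (the fraction r/n) and `iterations` are chosen here for illustration, the x values are assumed to be distinct, and all pairwise distances are formed at once for clarity rather than speed.

```python
import numpy as np

def tricube(u):
    """W(u) = (1 - |u|^3)^3 for |u| < 1 and 0 otherwise (step 1)."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def bisquare(u):
    """B(u) = (1 - u^2)^2 for |u| < 1 and 0 otherwise (step 1)."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**2) ** 2, 0.0)

def weighted_line_value(x, y, w, x0):
    """Fit a line by weighted least squares and return its value at x0."""
    sw = w.sum()
    if sw <= 0:                      # no point carries weight (assumed rare)
        return np.nan
    xbar = np.dot(w, x) / sw
    ybar = np.dot(w, y) / sw
    sxx = np.dot(w, (x - xbar) ** 2)
    if sxx <= 0:                     # all weight on one abscissa: use the mean
        return ybar
    slope = np.dot(w, (x - xbar) * (y - ybar)) / sxx
    return ybar + slope * (x0 - xbar)

def robust_lowess(x, y, f=0.5, iterations=2):
    """Fitted values from steps (1) to (6); assumes distinct x values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = max(2, min(n, int(round(f * n))))
    dist = np.abs(x[:, None] - x[None, :])     # dist[i, k] = |x_i - x_k|
    h = np.sort(dist, axis=1)[:, r - 1]        # step 2: r-th smallest distance
    w = tricube(dist / h[:, None])             # w[i, k] = w_k(x_i)

    def fit(delta):
        return np.array([weighted_line_value(x, y, delta * w[i], x[i])
                         for i in range(n)])

    y_hat = fit(np.ones(n))                    # step 3
    for _ in range(iterations):                # step 6
        e = y - y_hat                          # step 4
        s = np.median(np.abs(e))
        if s == 0:                             # perfect fit, nothing to reweight
            break
        y_hat = fit(bisquare(e / (6.0 * s)))   # step 5
    return y_hat
```

Calling `robust_lowess(x, y, f=0.5, iterations=2)` corresponds to the default settings described in the text; `iterations=0` gives the non-robust fit of step (3) alone.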

The weights $w_k(x_i)$ decrease as the distance of $x_k$ from $x_i$ increases. Thus points
whose abscissas are close to $x_i$ play a large role in the determination of $\hat{y}_i$, while
points far away play little or no role. Increasing r, the number of nearest neighbors, tends to
increase the smoothness of the smoothed points $(x_i, \hat{y}_i)$. Choosing r to be 20% to 80% of n
should serve most purposes. A practical default value, used in the example of Figures 1 and 2,
is .5n. The "tricube" weight function, W(x), is used to weight neighbors
since it results in certain desirable statistical properties [7].

The iterative fitting in steps (4) to (6) is carried out to achieve a robust smooth in which
a small fraction of deviant points does not distort the results. Deviant points tend to have
small robustness weights, $\delta_k$, and therefore do not play a large role in the determination
of the smoothed values. The "bisquare" weight function, B(x), is used since other investigations
have shown it to perform well for robust estimation of location [8] and for robust regression
[9]. Two iterations of steps (4) and (5) are generally quite sufficient; this is the number used
in the example of Figures 1 and 2.
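
A small numerical illustration may help show how a deviant point is downweighted. The sketch below applies step (4) to a made-up set of residuals; the function name `robustness_weights` and the residual values are hypothetical, and the median absolute residual is assumed to be nonzero.

```python
import numpy as np

def robustness_weights(residuals):
    """Bisquare robustness weights B(e_k / 6s), with s the median of |e_k|."""
    e = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(e))          # assumed nonzero in this sketch
    u = e / (6.0 * s)
    return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)

# Four ordinary residuals and one deviant residual (made-up numbers).
delta = robustness_weights([0.1, -0.2, 0.3, -0.1, 5.0])
```

Here s = 0.2, so any residual of magnitude 1.2 or more receives weight zero; the deviant value 5.0 is dropped from the next fit while the other points keep weights close to 1.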

3. VISUAL CONSIDERATIONS 

3.1 Plotting the Smooth 

The smoothed points can be plotted by joining successive points by straight lines as in
Figure 2 or by symbols at the points $(x_i, \hat{y}_i)$. When the smooth is superimposed on
the scatterplot the first method provides greater visual discrimination with the points of the
scatterplot. But using lines raises the danger of an inappropriate interpolation.
One possible approach is to use symbols initially when analyzing the data; then if a particular
plot is needed for further use, such as presentation to others, the lines can be
used if the initial plot indicates that linear interpolation
would not lead to a distortion of the results.

The smoothed values also can be plotted on a
separate grid with the same scales as the original scatterplot.
This is particularly attractive for low resolution plots such 
as printer plots. 

3.2 Symmetric Summaries 

The method of summarizing the scatterplot in Section 2 is appropriate when Y is the response
or dependent variable and X is the independent variable. In cases where neither variable can be
designated as the response, the scatterplot can be summarized by plotting the smooth of Y given
X and the smooth of X given Y.

3.3 Summarizing Scale and Choosing a Scale Stabilizing 
Transformation 

The smoothed points in Figure 2 portray the location of the distribution of Y given
$X = x_i$. It is often useful to have, in addition, a summary of the scale. This can be
done by plotting $|e_i|$ against $x_i$ and computing and plotting a smooth,
$(x_i, \hat{s}_i)$, of this scatterplot.

If the scale of $y_i$ is a function, $\sigma(\mu)$, of the location, $\mu$, of $y_i$, then the
transformation, t, of $y_i$ which stabilizes the scale [10, p. 425] satisfies

$$t'(\mu) = \frac{1}{\sigma(\mu)} .$$

Suppose t is a power transformation

$$t(\mu) = \begin{cases} \mu^{p} & p \ne 0 \\ \log \mu & p = 0 . \end{cases}$$

Tukey [5, p. 103] has suggested a procedure for choosing a scale stabilizing power
transformation for batches of numbers, which can be extended to choosing one for scatterplots.
From the above equations we have, apart from an additive constant that does not depend on $\mu$,

$$\log \sigma(\mu) = -\log t'(\mu) = -(p - 1) \log \mu .$$

A plot of $\hat{y}_i$ vs. $\hat{s}_i$ describes the function $\sigma(\mu)$. Thus a plot of
$\log \hat{s}_i$ vs. $-\log \hat{y}_i$ will, apart from sampling fluctuations, follow a line with
slope $p - 1$. Thus p can be chosen by fitting a line to the plot, either by eye or by some
numerical method.
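
As one possible "numerical method", the line can be fitted by ordinary least squares. The sketch below assumes that the location smooth and the scale smooth are already available as positive arrays; the function name `choose_power` and its arguments are hypothetical, and least squares is only one of several reasonable ways to fit the line.

```python
import numpy as np

def choose_power(y_hat, s_hat):
    """Estimate p from the slope of log(s_hat) against -log(y_hat).

    Both inputs are assumed to be strictly positive.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    slope, _intercept = np.polyfit(-np.log(y_hat), np.log(s_hat), 1)
    return slope + 1.0   # the line has slope p - 1
```

For instance, when the scale is proportional to the location the fitted slope is -1 and the suggested power is p = 0, the log transformation; when the scale grows like the square root of the location the suggested power is p = .5.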

3.4 Judging the Amount of Smoothing 

The most practical method for choosing r, the
number of nearest neighbors, is to study the visual display.
The objective is to choose r as large as possible without distorting
the underlying pattern in the scatterplot.

The fitted value, $\hat{y}_i$, in step (3) of Section 2 can be written as

$$\hat{y}_i = \sum_{k=1}^{n} r_k(x_i)\, y_k ,$$

where $r_k(x_i)$ depends only on $x_1, \ldots, x_n$. The equivalent number of parameters

$$\mathrm{enp} = 2 \sum_{i=1}^{n} r_i(x_i) - \sum_{i=1}^{n} \sum_{k=1}^{n} r_k^2(x_i)$$

also can be used to judge the relative amounts of smoothing for different values of r. An
interpretation of enp arises from considering the variability in the residuals, $e_i$, of the
fitted values in (3). Suppose the $y_i$ are independent and have common variance $\sigma^2$; then

$$\mathrm{enp} = n - \frac{E \sum_{i=1}^{n} e_i^2}{\sigma^2} .$$

Suppose the $e_i$ were the residuals from a linear least squares regression of $y_i$ on $x_i$
using q parameters; then enp would be equal to q. Thus enp for locally weighted regression can
be interpreted as an equivalent number of parameters.
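
The quantity enp can be computed directly from the coefficients $r_k(x_i)$. The sketch below collects them in an n by n matrix for the non-robust fit of step (3) and evaluates the definition; the names `smoother_matrix` and `equivalent_number_of_parameters` are hypothetical, and distinct x values are assumed.

```python
import numpy as np

def tricube(u):
    """W(u) = (1 - |u|^3)^3 for |u| < 1 and 0 otherwise."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def smoother_matrix(x, f=0.5):
    """Matrix R with R[i, k] = r_k(x_i) for the local linear fit of step (3)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = max(2, min(n, int(round(f * n))))
    X = np.column_stack([np.ones(n), x])        # design matrix of a line
    R = np.empty((n, n))
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[r - 1]                   # r-th smallest distance
        XtW = X.T * tricube(d / h)              # X' W for the weights at x_i
        R[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    return R

def equivalent_number_of_parameters(R):
    """enp = 2 * sum_i r_i(x_i) - sum_i sum_k r_k(x_i)^2."""
    return 2.0 * np.trace(R) - np.sum(R**2)
```

Applied to the projection matrix of an ordinary least squares line, the same function returns 2, which agrees with the interpretation of enp as a number of parameters.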

In [7] it is shown that enp can be approximated (when the weight function is tricube) by

$$2\left(1 + \frac{n}{r}\right) .$$

Thus for the default value of $r/n = .5$ described in Section 2, the approximate equivalent
number of parameters is 6.

3.5 Nearest Neighbor vs. Equal Resolution

An alternative to choosing $h_i$ through nearest neighbors is to use a constant value h for
the computation of all fitted values. This provides equal resolution over all
regions of the scatterplot but leads to appreciable increases
in the variance at isolated points and at the ends of the
scatterplot. The selection of the nearest neighbor routine is
based on its more satisfactory performance, particularly at
the ends of the scatterplot, for the applications which I
have encountered. However, other users may find equal
resolution more satisfactory in other applications.

4. COMPUTATIONAL CONSIDERATIONS 

4.1 Reducing the Computations 

Suppose the $x_i$ are ordered from smallest to largest and let $x_{a(i)}, \ldots, x_{b(i)}$ be
the ordered r nearest neighbors of $x_i$. The values of $a(i+1)$ and $b(i+1)$ can be found
from $a(i)$ and $b(i)$ using the following scheme:

(1) Let $A = a(i)$ and $B = b(i)$.
(2) Let $d_A = x_{i+1} - x_A$ and $d_B = x_{B+1} - x_{i+1}$.
(3) a. If $d_A \le d_B$ then $a(i+1) = A$ and $b(i+1) = B$.

b. If $d_A > d_B$ replace A by A+1 and B by B+1 and return to (2).

(4) $h_{i+1}$ is the maximum of $x_{i+1} - x_A$ and $x_B - x_{i+1}$.
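
The updating scheme can be sketched as a sliding window over the sorted abscissas. The function name `neighbor_windows` is hypothetical, indices are zero-based, and the check that B+1 stays inside the data is an assumption added here, since the scheme as stated does not say what happens at the right end.

```python
import numpy as np

def neighbor_windows(x_sorted, r):
    """Return arrays a, b, h for sorted x: the r nearest neighbors of x[i]
    are x[a[i]], ..., x[b[i]] and h[i] is the distance to the farthest one."""
    x = np.asarray(x_sorted, dtype=float)
    n = len(x)
    a = np.empty(n, dtype=int)
    b = np.empty(n, dtype=int)
    h = np.empty(n)
    A, B = 0, r - 1                  # window for the smallest x: first r points
    for i in range(n):
        # Steps (2) and (3): shift the window right while the point just
        # beyond its right end is closer to x[i] than its left end is.
        while B + 1 < n and x[i] - x[A] > x[B + 1] - x[i]:
            A += 1
            B += 1
        a[i], b[i] = A, B
        h[i] = max(x[i] - x[A], x[B] - x[i])   # step (4)
    return a, b, h
```

Because the window only ever moves to the right, the total work for finding all n neighborhoods is proportional to n rather than to n times r.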

Thus this scheme can be used to save computations by computing the fitted values at $x_1$,
then $x_2$, etc. Only $x_{a(i)}, \ldots, x_{b(i)}$ need be considered in the weighted least
squares computation of $\hat{y}_i$ since $W(x) = 0$ for $|x| \ge 1$.
This saving would not be achieved by using a weight function which becomes small but not zero
for large x, such as the normal probability density.

4.2 Computation Time 

An experiment was run to determine the run time of
the smooth using the scheme in Section 4.1 for the nearest
neighbor algorithm in Section 2 with one iteration. (The
portable FORTRAN routine is available from the author.)
Each additional iteration would increase the time by slightly
less than 50% of the time required for one iteration. Consideration
of the algorithm shows that the run time is
independent of the configuration of the $x_i$. (This would
not be true for the equal resolution algorithm.) The experiment
was run for all 20 combinations of 5 values of
$f = r/n$ (.2, .4, .6, .8, 1.0) and 4 values of n (25, 50, 100,
and 200). A least squares fit to the log (base 10) run time
(cpu milliseconds) resulted in the fitted equation

$$\log \mathrm{time} = -.49 + 1.98 \log n + .87 \log f .$$

The estimate of the residual standard error is [...], while the standard
error of the log times is .718 log milliseconds. Thus the
equation provides a very close fit to the data.
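
The fitted equation can be evaluated directly to see how the run time scales. The sketch below simply codes the equation above; the function name `predicted_time_ms` is hypothetical, and the absolute times apply only to the machine used in the experiment, so the ratios between settings are the more useful output.

```python
import math

def predicted_time_ms(n, f):
    """CPU milliseconds from log10(time) = -.49 + 1.98 log10(n) + .87 log10(f)."""
    return 10.0 ** (-0.49 + 1.98 * math.log10(n) + 0.87 * math.log10(f))

# Doubling n at fixed f multiplies the predicted time by 2 ** 1.98, about 4.
ratio = predicted_time_ms(200, 0.5) / predicted_time_ms(100, 0.5)
```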



4.3 Thinning 

The computations for the nearest neighbor algorithm
are, as shown in the previous section, approximately of the
order $f^{.9} n^2$. For scatterplots with fewer than 50 to 100
points the computations present no problems. Plots with
more points generally need not incur the cost of using all
the points since computing the smooth at a subset of the
points will generally perform satisfactorily. (The smooth
can, of course, still be superimposed on the full scatterplot.)

Two possible methods of thinning are to select every
k-th value of the ordered $x_i$ or to form a grid of equally
spaced points on the horizontal axis and select, for each
grid value, the $x_i$ which is closest to the value.
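
Both thinning rules can be sketched as functions that return the indices of the selected points. The names `thin_every_kth` and `thin_on_grid`, and the parameters k and m (number of grid values), are hypothetical; the x values are assumed to be sorted.

```python
import numpy as np

def thin_every_kth(x_sorted, k):
    """Indices of every k-th value of the ordered x, starting with the first."""
    return np.arange(0, len(x_sorted), k)

def thin_on_grid(x_sorted, m):
    """Indices of the x values closest to each of m equally spaced grid values."""
    x = np.asarray(x_sorted, dtype=float)
    grid = np.linspace(x[0], x[-1], m)
    nearest = np.abs(x[None, :] - grid[:, None]).argmin(axis=1)
    return np.unique(nearest)        # drop duplicates when points are sparse
```

The second rule tends to give a more even coverage of the horizontal axis when the x values are clustered, at the cost of possibly returning fewer than m points.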

4.4 Locally Weighted Regression of Order d 

Steps (3) and (5) of the procedure in Section 2 can
be generalized by fitting a polynomial of degree d, where d
is a non-negative integer. Choosing d = 1 appears to
strike a good balance between computational ease and the
need for flexibility to reproduce patterns in the data. The
case d = 0 is the simplest, computationally, but in the
practical situation an assumption of local linearity seems to
serve far better than an assumption of local constancy since
the practice is to plot variables which are related to one
another. For d = 2, however, computational considerations
begin to override the need for having flexibility.
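
A local fit of general degree d at a single point can be sketched with a weighted least squares solve. The function name `local_polynomial_value` is hypothetical; the weights are assumed to be the nonnegative values $w_k(x_i)$ (or $\delta_k w_k(x_i)$) from Section 2, with at least d+1 distinct abscissas carrying positive weight.

```python
import numpy as np

def local_polynomial_value(x, y, weights, x0, d=1):
    """Value at x0 of the degree-d polynomial fitted by weighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center at x0 so that the fitted value at x0 is the constant coefficient.
    V = np.vander(x - x0, d + 1, increasing=True)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(V * root_w[:, None], y * root_w, rcond=None)
    return coef[0]
```

With d = 0 this reduces to a weighted mean of the $y_k$, and with d = 1 it gives the local line used in steps (3) and (5).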